Insights

Insights derived from analysis

Viz and Insights

In this section, we will employ appropriate visually driven data analysis techniques to answer the questions in the challenge.

Qn 1

Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies?

1.1 Location popularity

Transaction volume for each location is analyzed using the loyalty and credit card transaction information provided. We also parsed the date information from timestamp in credit card data so as to align the comparison with loyalty data, where only date information is provided.

Based on the interactive charts, we noted that the top 3 most popular locations by transaction volumes are:

Loyalty card transaction volume

  1. Katerina’s Cafe- Most popular on 01/11/2014 with 19 transactions, followed by 01/16/2014 (18) and 01/14/2014 (17).
  2. Hippokampos- Most popular on 01/08/2014 with 14 transactions, followed by 01/06/2014, 01/12/2014, 01/13/2014, 01/16/2014 and 01/17/2014 tied in second position with 13 transactions each day, and 01/15/2014 with 11 transactions.
  3. Guy’s Gyros- Most popular on 01/13/2014 with 15 transactions, followed by 01/15/2014 (14) and 01/09/2014, 01/16/2014 and 01/17/2014 being the third most popular days with 13 transactions.
## Volume from loyalty card dataset by date- linked

loyalty_data_cleaned_grpcount <- loyalty_data_cleaned %>%
    group_by(location, date) %>%
    summarise(txn_count = n())

shared_data <- SharedData$new(loyalty_data_cleaned_grpcount,
    key = ~location)

loyalty_loc_vol_bar <- shared_data %>%
    plot_ly() %>%
    group_by(location) %>%
    summarise(txn_count = sum(txn_count)) %>%
    arrange(desc(txn_count)) %>%
    add_bars(x = ~txn_count, y = ~location, type = "bar",
        hoverinfo = "text", text = ~paste("Location:", location,
            "<br>Transaction Volume:", txn_count)) %>%
    layout(title = "Loyalty card transaction Volume by Location",
        xaxis = list(title = "Transaction Volume", zeroline = FALSE),
        yaxis = list(title = "Location", zeroline = FALSE,
            categoryorder = "array", dtick = 1, categoryarray = ~txn_count))

loyalty_loc_vol_line <- shared_data %>%
    plot_ly() %>%
    add_lines(x = ~date, y = ~txn_count, hoverinfo = "text",
        text = ~paste("Location:", location, "<br>Date:",
            date, "<br>Transaction Volume:", txn_count)) %>%
    layout(title = "Loyalty card transaction Volume by Location across dates",
        xaxis = list(title = "Date", showgrid = TRUE, showticklabels = TRUE,
            type = "date", tickangle = 0, tickformat = "%d%b"),
        yaxis = list(title = "Transaction volume"))

bscols(widths = c(12, 12), loyalty_loc_vol_bar, loyalty_loc_vol_line)

Credit card transaction volume

  1. Katerina’s Cafe- Most popular on 01/18/2014 with 20 transactions, followed by 01/06/2014 (19) and 01/11/2014 (18).
  2. Hippokampos- Most popular on 01/16/2014 with 17 transactions, followed by 01/06/2014 (16), and 01/09/2014 with 14 transactions.
  3. Guy’s Gyros- Most popular on 01/09/2014 and 01/13/2014 with 15 transactions, followed by 01/10/2014 and 01/17/2014 with 14 transactions each day, and 01/15/2014 (13) and 01/16/2014 (13) being the third most popular days by transaction volume.
## Volume from credit card dataset by date- linked

cc_data_cleaned_grpcount <- cc_data_cleaned %>%
    group_by(location, date) %>%
    summarise(txn_count = n())

shared_data_cc <- SharedData$new(cc_data_cleaned_grpcount,
    key = ~location)

cc_data_loc_vol_bar <- shared_data_cc %>%
    plot_ly() %>%
    group_by(location) %>%
    summarise(txn_count = sum(txn_count)) %>%
    arrange(desc(txn_count)) %>%
    add_bars(x = ~txn_count, y = ~location, type = "bar",
        hoverinfo = "text", text = ~paste("Location:", location,
            "<br>Transaction Volume:", txn_count)) %>%
    layout(title = "Credit card transaction Volume by Location",
        xaxis = list(title = "Transaction Volume", zeroline = FALSE),
        yaxis = list(title = "Location", zeroline = FALSE,
            categoryorder = "array", dtick = 1, categoryarray = ~txn_count))

cc_loc_vol_line <- shared_data_cc %>%
    plot_ly() %>%
    add_lines(x = ~date, y = ~txn_count, hoverinfo = "text",
        text = ~paste("Location:", location, "<br>Date:",
            date, "<br>Transaction Volume:", txn_count)) %>%
    layout(title = "Credit card transaction Volume by Location across dates",
        xaxis = list(title = "Date", showgrid = TRUE, showticklabels = TRUE,
            type = "date", tickangle = 0, tickformat = "%d%b"),
        yaxis = list(title = "Transaction volume"))

bscols(widths = c(12, 12), cc_data_loc_vol_bar, cc_loc_vol_line)

Just by comparing the top 3 locations and their popular days, we noted that there are differences in the transaction count per day for each location on the loyalty card and credit card. In particular, there were days where loyalty card transactions were higher than credit card transactions. This is unexpected as loyalty card is used to collect discounts and rewards and cannot be used for payment. Hence one would expect both volumes to either be the same or for credit card volumes (actual purchase) to be higher than loyalty card volumes (in cases where the employee may have forgotten to present loyalty card for rewards/ discounts). The difference in volumes each day across both cards are illustrated below. The darker the blue hue, the greater the loyalty data transaction volume exceeds credit card transaction volume.

Some of the notable differences observed include: 1) Kronos Mart- Loyalty card volume exceeded credit card volume by 3 transactions on 01/18/2014 2) Katerina’s Cafe- Credit card volume exceeded loyalty card volume by 5 transactions on 01/18/2014 3) Brew’ve Been Served- Credit card volume exceeded loyalty card volume by 4 transactions on 01/09/2014

## Difference in volume by day

loyalty_data_count <- loyalty_data_cleaned %>%
    group_by(location, date) %>%
    summarise(txn_count = n()) %>%
    arrange(desc(txn_count))

cc_data_count <- cc_data_cleaned %>%
    group_by(location, date) %>%
    summarise(txn_count = n()) %>%
    arrange(desc(txn_count))

count_loyalty_cc_comb <- loyalty_data_count %>%
    full_join(cc_data_count, by = c("location",
        "date")) %>%
    rename(loyalty_card_txncount = "txn_count.x",
        credit_card_txncount = "txn_count.y") %>%
    mutate_if(is.numeric, replace_na, replace = 0) %>%
    mutate(difference_vol = loyalty_card_txncount -
        credit_card_txncount)

a <- list(title = "Difference in volume per day by location<br>(loyalty card vol - credit card vol)",
    showticklabels = TRUE, dtick = 1)

txn_vol_difference_heatmap <- count_loyalty_cc_comb %>%
    plot_ly(x = ~date, y = ~location, z = ~difference_vol,
        colors = brewer.pal(3, "Blues"),
        type = "heatmap", hoverinfo = "text",
        text = ~paste("Date:", date, "<br> Location:",
            location, "<br> Vol_Diference:",
            difference_vol))
txn_vol_difference_heatmap %>%
    layout(title = "Difference in transaction volume by Location across dates",
        yaxis = a, xaxis = list(dtick = "86400000.0",
            type = "date", title = "Date",
            tickangle = 0, tickformat = "%d%b"),
        margin = m, plot_bgcolor = "#bdbdbd")

Figure 1: Heatmap showing difference in transaction volume per location

We will then analyze the transaction volume by day of week to observe volume trends across the week.

Loyalty card transaction volume

  1. Katerina’s Cafe- Most popular on Saturdays with 34 transactions, followed by Tuesdays (33) and Thursdays (32).
  2. Hippokampos- Most popular on Mondays with 26 transactions, followed by Wednesdays and Thursdays (25) and Fridays (22).
  3. Guy’s Gyros- Most popular on Mondays and Thursdays with 26 transactions, followed by Fridays (25) and Wednesdays (23).
## Loyalty card volume by day
## of week

## Preparing the data
loyalty_data_cleaned$daynumber = lubridate::wday(loyalty_data_cleaned$date,
    week_start = 1)
loyalty_data_cleaned$weekday = factor(loyalty_data_cleaned$weekday,
    levels = c("Sunday", "Monday",
        "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"))



## Plots
shared_data_loyalty = SharedData$new(loyalty_data_cleaned,
    key = ~location)

loyalty_wkday_vol <- shared_data_loyalty %>%
    plot_ly(width = 700, height = 500) %>%
    group_by(weekday, location) %>%
    summarize(transaction = n()) %>%
    arrange(desc(transaction)) %>%
    add_trace(x = ~weekday, y = ~transaction,
        color = ~location, type = "bar",
        hoverinfo = "text", hovertext = ~paste("<br> Day_of_week:",
            weekday, "<br> Transaction Volume:",
            transaction)) %>%
    layout(title = "Loyalty card transaction volume across day of week",
        xaxis = list(title = "Day_of_week",
            type = "factor", categoryorder = "factor",
            categoryarray = loyalty_data_cleaned$weekday[order(loyalty_data_cleaned[["daynumber"]])],
            showgrid = TRUE, showticklabels = TRUE),
        yaxis = list(title = "Transaction volume"))

bscols(widths = c(10, 4), list(filter_select("location",
    "Please Specify the location",
    shared_data_loyalty, group = ~location,
    multiple = F), loyalty_wkday_vol,
    datatable(shared_data_loyalty) %>%
        formatDate(1, method = "toLocaleDateString")))

Given that we are provided with credit card timestamp information, we will take one step further to analyze the volume of credit card transactions by location and time.

Credit card transaction volume

  1. Katerina’s Cafe- Most popular on Saturdays with 38 transactions, followed by Mondays and Tuesdays (34) and Thursdays (32). Usually popular around dinner time (1700-2000) across all days in the week.
  2. Hippokampos- Most popular on Thursdays with 31 transactions, followed by Mondays (28) and Wednesdays (27). Usually popular around lunch time (1300-1600), except for weekends where it is more popular during dinner hours (1700-2000).
  3. Guy’s Gyros- Most popular on Thursdays and Fridays with 28 transactions, followed by Mondays (26) and Tuesdays (25) and Wednesdays (25). Usually popular around dinner time (1700-2000), except for Fridays where it is more popular during lunch hours (1300-1600).
## Credit card volume by day
## of week Preparing the data
cc_data_cleaned$daynumber = lubridate::wday(cc_data_cleaned$timestamp,
    week_start = 1)
cc_data_cleaned$weekday = factor(cc_data_cleaned$weekday,
    levels = c("Sunday", "Monday",
        "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"))

## Preparing timegroup
cc_data_cleaned$timegroup = ""
cc_data_cleaned$timegroup[cc_data_cleaned$hour <
    5 & 0 <= cc_data_cleaned$hour] = "00 - 04"
cc_data_cleaned$timegroup[cc_data_cleaned$hour <
    9 & 5 <= cc_data_cleaned$hour] = "05 - 08"
cc_data_cleaned$timegroup[cc_data_cleaned$hour <
    13 & 9 <= cc_data_cleaned$hour] = "09 - 12"
cc_data_cleaned$timegroup[cc_data_cleaned$hour <
    17 & 13 <= cc_data_cleaned$hour] = "13 - 16"
cc_data_cleaned$timegroup[cc_data_cleaned$hour <
    21 & 17 <= cc_data_cleaned$hour] = "17 - 20"
cc_data_cleaned$timegroup[cc_data_cleaned$hour <=
    24 & 21 <= cc_data_cleaned$hour] = "21 - 24"

## Plots
shared_data = SharedData$new(cc_data_cleaned,
    key = ~location)

cc_wkday_vol <- shared_data %>%
    plot_ly(width = 700, height = 500) %>%
    group_by(weekday, location) %>%
    summarize(transaction = n()) %>%
    arrange(desc(transaction)) %>%
    add_trace(x = ~weekday, y = ~transaction,
        color = ~location, type = "bar",
        hoverinfo = "text", hovertext = ~paste("<br> Day_of_week:",
            weekday, "<br> Transaction Volume:",
            transaction)) %>%
    layout(title = "Credit card transaction volume across day of week",
        xaxis = list(title = "Day_of_week",
            type = "factor", categoryorder = "factor",
            categoryarray = cc_data_cleaned$weekday[order(cc_data_cleaned[["daynumber"]])],
            showgrid = TRUE, showticklabels = TRUE),
        yaxis = list(title = "Transaction volume"))

cc_timegroup_vol <- shared_data %>%
    plot_ly(width = 700, height = 500) %>%
    group_by(weekday, timegroup,
        location) %>%
    summarize(transaction = n()) %>%
    add_trace(x = ~weekday, y = ~transaction,
        color = ~timegroup, type = "bar",
        hoverinfo = "text", hovertext = ~paste("<br> Time:",
            timegroup, "<br>Transaction Volume:",
            transaction)) %>%
    layout(title = "Credit card transaction volume at each timegroup across day of week",
        xaxis = list(title = "Time",
            type = "factor", categoryorder = "factor",
            categoryarray = cc_data_cleaned$weekday[order(cc_data_cleaned[["daynumber"]])],
            showgrid = TRUE, showticklabels = TRUE),
        yaxis = list(title = "Transaction volume"))



bscols(widths = c(10, 4), list(filter_select("location",
    "Please Specify the location",
    shared_data, group = ~location,
    multiple = F), cc_wkday_vol,
    cc_timegroup_vol, datatable(shared_data) %>%
        formatDate(1, method = "toLocaleDateString")))

As observed above, looking at just the top three most popular locations, the popular day of week differs based on the information derived from both cards. This is unexpected as we would expect the trends to be similar for both loyalty and credit cards as loyalty card is used to collect discounts and rewards and cannot be used for payment. Hence one would expect both volumes to either be the same or for credit card volumes (actual purchase) to be higher than loyalty card volumes (in cases where the employee may have forgotten to present loyalty card for rewards/ discounts).

Despite the differences in the credit card and loyalty card transaction volumes noted, we will not be making any adjustments at this stage to either dataset since we are unable to determine which dataset is accurate. We will be mindful in using either dataset for further analysis.

1.2 Anomalies

Other than the above, other anomalies noted include:

  1. Timestamp data in credit card dataset was provided in the datetime format while timestamp data in loyalty card dataset was in the date format. This made it harder to compare between both datasets. As discussed above, in order to overcome this, I parsed the date information from datetime data provided in the credit card dataset so as to align the information with that in the loyalty card dataset for comparability.
glimpse(loyalty_data)
Rows: 1,392
Columns: 4
$ timestamp  <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~
glimpse(cc_data)
Rows: 1,490
Columns: 4
$ timestamp  <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
  1. Based on the timestamp of credit card transactions, we noted that all of the transactions in “Bean There Done That”, “Brewed Awakenings”, “Coffee Shack” and “Jack’s Magical Beans” were made at 12:00pm. It is highly unlikely that all transactions in these locations are transacted at the same time. Hence, the timestamp for these transactions may be incorrect.
    Given that these timestamps may not be representative of the actual transaction time, we will be mindful of using this information for further analysis subsequently.
cc_data_incorrect_time <- cc_data_cleaned %>%
    filter(location == "Bean There Done That" | location ==
        "Brewed Awakenings" | location == "Coffee Shack" |
        location == "Jack's Magical Beans")

cc_data_incorrect_time$timestamp <- as.character(cc_data_incorrect_time$timestamp)

cc_data_incorrect_time = subset(cc_data_incorrect_time,
    select = -c(timegroup, day, hour, daynumber, weekday))

DT::datatable(cc_data_incorrect_time, options = list(pageLength = 10,
    width = "100%"))
  1. Furthermore, we also noted that there are several transactions in Kronos Mart at odd hours (0300-0359) on 13 January and 19 January. This is highly unusual and warrants further investigation. We will be mindful of using this information for further analysis subsequently.
cc_data_kronos <- cc_data_cleaned %>%
    filter(location == "Kronos Mart")

cc_data_kronos$timestamp <- as.character(cc_data_kronos$timestamp)

cc_data_kronos = subset(cc_data_kronos, select = -c(timegroup,
    day, hour, daynumber, weekday))

DT::datatable(cc_data_kronos, options = list(pageLength = 10,
    width = "100%"))
  1. As seen in the datatables below, there are 55 credit cards but only 54 loyalty cards transaction information provided. This is unusual as all employees are provided with a loyalty card. This discrepancy may arise because the employee does not want his/ her transactions to be tracked and hence is avoiding using the loyalty card, or that employees are using more than 1 credit card for their purchases. Further investigation should be conducted to find out.
## loyalty card
distinct_loyalty <- loyalty_data_cleaned %>%
    group_by(loyaltynum) %>%
    summarize(total_spent = sum(price)) %>%
    arrange(desc(total_spent))

DT::datatable(distinct_loyalty, options = list(columnDefs = list(list(className = "dt-center",
    targets = 0:2, width = "20%"))))
## credit card
distinct_cc <- cc_data_cleaned %>%
    group_by(last4ccnum) %>%
    summarize(total_spent = sum(price)) %>%
    arrange(desc(total_spent))

DT::datatable(distinct_cc, options = list(columnDefs = list(list(className = "dt-center",
    targets = 0:2, width = "20%"))))

Qn 2

Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find?

2.1 Discrepancies

  1. Assuming that the car assignment list provided includes all employees, we noted that there are 44 distinct employees. However, we noted that there are 55 distinct credit card numbers and 54 distinct loyalty card numbers. This is unusual as each employee should have been issued a loyalty card and hence we would expect number of distinct credit cards, loyalty cards and employee count to match.
    More investigation should be made into this discrepancy. Possible reaons could be that employees could have used more than one credit card with their loyalty card. Another possibility could be that there is a new employee who has not received the loyalty card.
    Given that the distinct employee count is significantly fewer than the number of distinct loyalty and credit cards, we should check with Gastech if there are any employees missing from this list. We should not expect the number of distinct employees and distinct loyalty cards to differ significantly as it does not make sense to issue an employee with multiple loyalty cards.
## distinct employee names
distinct_empname <- location_carid_join %>%
    group_by(FirstName, LastName) %>%
    summarize(mean = mean(hour)) %>%
    drop_na(FirstName)

distinct_empname <- subset(distinct_empname,
    select = -c(mean))

DT::datatable(distinct_empname, filter = "top",
    options = list(columnDefs = list(list(className = "dt-center",
        targets = 0:2, autoWidth = TRUE,
        scrollX = TRUE))))

Figure 2: Distinct employee names

## distinct loyalty cards
DT::datatable(distinct_loyalty, options = list(columnDefs = list(list(className = "dt-center",
    targets = 0:2, width = "20%"))))

Figure 3: Distinct loyalty card numbers

## distinct credit cards
DT::datatable(distinct_cc, options = list(columnDefs = list(list(className = "dt-center",
    targets = 0:2, width = "20%"))))

Figure 4: Distinct credit card numbers

  1. From the car assignment dataset provided, we observe that there are nine truck drivers with no ID. This is consistent with what Gastech has explained, which is that employees who do not have company cars have the ability to check out company trucks for business use, but these trucks cannot be used for personal business.
    The case scenario does not state which CarIDs are referring to trucks. However, assuming that the 3 digit CarID represents trucks, we only note GPS data for five trucks. There is no evidence as to whether the truck ID is sequential or if each truck driver is assigned to a truck. Given that there are 9 truck drivers and only 5 truck GPS data provided, there is possibility that:

    1. Each truck driver is not assigned to a unique truck and trucks can be shared.

    2. GPS information for 4 other trucks are missing in the GPS dataset.

To perform further investigation on this, we will plot the GPS paths of each carID over the Abila map to identify their route.

## Employees with no cars assigned
missing_carid <- location_carid_join %>%
    group_by(id) %>%
    filter(is.na(id))
missing_carid = subset(missing_carid, select = -c(lat, long,
    date, day, hour))

DT::datatable(missing_carid, filter = "top", options = list(autoWidth = TRUE,
    scrollX = TRUE))

Figure 5: Employees with no assigned car

## Cars not assigned to employees
missing_empname <- location_carid_join %>%
    group_by(id, FirstName, LastName) %>%
    filter(is.na(FirstName) | is.na(LastName)) %>%
    select(id) %>%
    unique()

DT::datatable(missing_empname, options = list(columnDefs = list(list(className = "dt-center",
    targets = 0:3, width = "25%"))))

Figure 6: CarID with no assigned employee

2.2 GeoVisual Analysis

We will first perform georeferencing using the SHP files provided.

Georeferencing

  1. Download and launch QGIS, an open-sourced GIS software.

  2. Start a new project by clicking on Project > New.

Figure 1: Start new project
  1. Add a vector layer. Navigate to Layer > Add Layer > Add Vector Layer
Figure 2: Add vector layer
  1. Click on the “…” button and navigate to location of Abila.shp file. Click on Add. You should see Figure 4 in your main pane.
Figure 3: Select SHP file
Figure 4: Current map
  1. To make the map clearer, we will then change the lines from green to black. Right click on Abila under the layers panel > select Properties > select Symbology. click on the arrow dropdown next to Color and select black > Click Apply and Ok. Your screen should resemble Figure 6 below.
Figure 5: Select Properties
Figure 6: Change line color
  1. To perform the georeferencing, click on Raster on the top pane > Georeferencer. When the new window appears, click on the blue square symbol on the top left to access the image file. Navigate to the MC2- tourist JPG file. In order to perform georeferencing, we need to select reference points (at least 6 control points) from the tourist map to match to the corresponding points on the SHP file map.
Figure 7: Perform georeferencing
Figure 8: Click on blue symbol
  1. To match the corresponding points, click on the identify tool on the top pane (Figure 9). Hover over the suspected corresponding point and click on it. The Identify Results pane appears on the right corner (Figure 10). Observe the results by matching the road name specified on the Identify Results pane to the tourist map.
Figure 9: Use the identify function to match corresponding points
Figure 10: Observe results to ensure correct point has been selected
  1. Once checked to be correct, under the Georeferencer, click on the selected point on the tourist map. On the Enter Map Coordinates window that appears, click on From Map Canvas and hover over the Shp file map. Ensure the crosshair is as close as possible to the actual point (Figure 11). By clicking on the point, this captures the X and Y coordinates (Figure 12). Click Ok. The GCP table in the Georeferencer window will be updated with the first reference point (Figure 13). Repeat the above steps for the other cross-reference points.
Figure 11: Select point on the tourist map
Figure 12: Update map coordinates
Figure 13: GCP table
  1. To check the settings, select Settings > Transformation Settings. Select the following settings as specified in Figure 15. Note that if the Target SRS is not set to WGS 84, click on the globe symbol next to the field and type “4326” under the filter pane. Select WGS 84 when it appears as the filtered result (Figure 16). Under output settings, ensure results are saved in a TIF file format for usage subsequently. Click on the tickbox next to “Save GCP points” and “Load in QGIS when done”. Click ok.
Figure 14: GCP table
Figure 15: Update settings
Figure 16: Update Target SRS
  1. Select file on the top pane > select Start Georeferencing. Once georeferencing is completed, minimise the georeferencer pane and switch back to the map.
Figure 17: Start georeferencing
  1. Under the layers pane on the left, drag the image file below the Abila streetmap so that the streetmap can be plotted on top of the tourist map (Figure 18). Doing a check, we observe that the streetmap is well-aligned with the image file. The TIF file created can then be used in RStudio as a raster object.
Figure 18: Rearrange layers
Figure 19: Georeferenced Map
  1. After preparing the georeferencing, we will then import the raster layer into RStudio.
## Import raster layer
bgmap <- raster("data/Geospatial/MC2-tourist.tif")
bgmap
class      : RasterLayer 
band       : 1  (of  3  bands)
dimensions : 1595, 2706, 4316070  (nrow, ncol, ncell)
resolution : 3.16216e-05, 3.16216e-05  (x, y)
extent     : 24.82419, 24.90976, 36.04499, 36.09543  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : MC2-tourist.tif 
names      : MC2.tourist 
values     : 0, 255  (min, max)
tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255)

  1. Using st_read() of sf package, import Abila shapefile into R. We will then convert aspatial data to simple feature dataframe.
## Import line data
Abila_st <- st_read(dsn = "data/Geospatial", layer = "Abila")
Reading layer `Abila' from data source 
  `D:\stellaloh91\Assignment\data\Geospatial' using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84
## Converting gps data to Simple Feature
## Data Frame
gps_data <- gps_data %>%
    mutate(day = get_day(gps_data$Timestamp),
        date = as_date(gps_data$Timestamp), minute = get_minute(gps_data$Timestamp),
        day_of_week = weekdays(gps_data$Timestamp),
        hour = get_hour(gps_data$Timestamp))

gps_data$timegroup = cut(gps_data$hour, c(0, 4,
    8, 12, 16, 20, 24))
levels(gps_data$timegroup) = c("0-4", "5-8", "8-12",
    "13-16", "17-20", "21-24")
gps_data
# A tibble: 685,169 x 10
   Timestamp           id      lat  long   day date       minute
   <dttm>              <fct> <dbl> <dbl> <int> <date>      <int>
 1 2014-01-06 06:28:01 35     36.1  24.9     6 2014-01-06     28
 2 2014-01-06 06:28:01 35     36.1  24.9     6 2014-01-06     28
 3 2014-01-06 06:28:03 35     36.1  24.9     6 2014-01-06     28
 4 2014-01-06 06:28:05 35     36.1  24.9     6 2014-01-06     28
 5 2014-01-06 06:28:06 35     36.1  24.9     6 2014-01-06     28
 6 2014-01-06 06:28:07 35     36.1  24.9     6 2014-01-06     28
 7 2014-01-06 06:28:09 35     36.1  24.9     6 2014-01-06     28
 8 2014-01-06 06:28:10 35     36.1  24.9     6 2014-01-06     28
 9 2014-01-06 06:28:11 35     36.1  24.9     6 2014-01-06     28
10 2014-01-06 06:28:12 35     36.1  24.9     6 2014-01-06     28
# ... with 685,159 more rows, and 3 more variables:
#   day_of_week <chr>, hour <int>, timegroup <fct>
gps_sf <- st_as_sf(gps_data, coords = c("long",
    "lat"), crs = 4326)
  1. Next, join the gps points into movement paths by using the drivers’ IDs as unique identifiers.
## Creating movement path from GPS points Group by need to
## come with summarize--> in this case we summarize using
## mean timestamp but this value is not needed
gps_path <- gps_sf %>%
    group_by(id, day, timegroup, hour) %>%
    summarize(m = mean(Timestamp), do_union = FALSE) %>%
    st_cast("LINESTRING") %>%
    rename(timestamp = "m")
  1. Checking the data, we noticed single coordinates pair in the line feature. The following code chunk is to identify and remove the orphan lines.
##Remove one point linestrings
points = npts(gps_path, by_feature = TRUE)
gps_path <- cbind(gps_path, points)
gps_path_cleaned <-gps_path[!(gps_path$points== 1),]
  1. Lastly, we then overplot the selected gps path onto the background tourist map.
## Plotting GPS paths
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 1, day == 6)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 1 on 01/06/2014")

By plotting the GPS coordinates using the Abila tourist map as background, we are able to visualize the path each vehicle is taking. The map is also interactive. Clicking on any point in the trajectory allows us to see the CarID, day and timestamp of the point in the route. This allows us to match the timestamp and location back to the credit card dataset, hence matching the credit card numbers to their corresponding CarID.

2.2.1 Anomalies

  1. After mapping the GPS trajectories, We also noted that there were no GPS data indicating that any car stopped near “Bean There Done That”, “Brewed Awakenings”, “Coffee Shack” and “Jack’s Magical Beans” during the period of transactions- 12:00pm. Hence, these transactions may be either incorrectly timed or may even be fraudulent.
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(hour == 12)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of all cars trajectories from 1200-1259")
  1. Furthermore as mentioned above, we also noted that there are several transactions in Kronos Mart at odd hours (0300-0359) on 13 January and 19 January. By filtering the map for 3am, we noted that there were no cars near Kronos Mart. Hence, these transactions may be either incorrectly timed or may even be fraudulent.
## Facet map
gps_path_selected <- gps_path_cleaned %>%
    filter(hour == 3)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "id", ncol = 3)
Facet graph of all cars trajectories from 0300-0359

Figure 7: Facet graph of all cars trajectories from 0300-0359

Qn3

Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data?

3.1 GeoVisual Analysis

3.1.1 Match CarID to credit card data

By plotting the GPS coordinates using the Abila tourist map as background, we are able to visualize the path each vehicle is using. The map is interactive- clicking on any point in the trajectory allows us to see the CarID, day and timestamp of the respective route. This allows us to match the timestamp and location back to the credit card dataset, hence matching the credit card numbers to their corresponding CarID. Alternatively, we can also use a facet map by hour of day to identify location of each car during each hour of the day.

By following the GPS coordinates of each carID, we are then able to identify the time that they arrive at certain points of interest in the map. With this information, we can match the points of interest to the credit card transactions at these locations, based on the timestamp on the GPS data and credit card data. After matching the unique CarID to the credit cards, we can then derive the owner of each card based on the car assignment data. Using a facet map allows us to visualize the route for each car during each hour across each of the 14 days of GPS data in record. This allows for easy observation and matching to the credit card data.

We have used an interactive map below to visualize the route for CarID 1 across 01/06/2014. This allows for easy observation and matching to the credit card data.

## Plotting GPS paths- interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 1, day == 6)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 1 on 01/06/2014")

In this case, we can see that the day begins at 0720 where Car 1 drives towards Hallowed Grounds and stops here for 35 minutes. It then leaves Hallowed Grounds at 0757 and reaches Gastech at 0804. At 1217, Car 1 drives towards Albert’s Fine Clothes, arriving at 1226. It departs at 1325 back to Gastech and arrives at 1334. At 1744, Car 1 leaves Gastech toward the area near Hallowed Grounds, arriving at 1748. At 1936, Car 1 leaves the area near Hallowed Grounds for an area near Albert’s Fine Clothes and arrives at 1949. It then departs at 2027 back to the area near Hallowed Grounds, arriving at 2033. At 2211, Car 1 once again departs the area near Hallowed Grounds for Gastech, arriving at 2215, and only returns to the area near Hallowed Grounds at 0100 on 01/07/2014, arriving at 0114.

By visualizing the route, we can filter the below credit card table for the relevant day, hour and location. The datatable below allows for multiple filters and this allows us to narrow down the list of credit cards which fits the criteria across the day. By repeating this process across the 14 days credit card transaction and location data provided, we will be able to match the credit card to CarID.

In this case, by filtering the below table for Hallowed Grounds on day 6, hour 7, we noted that credit card number ending with 9551 is on the list of transactions. Repeating this process for the locations visited throughout the 2 weeks period, we have matched Car 1 to credit card number ending with 9551.

cc_data_cleaned_min <- subset(cc_data_cleaned, select = -c(daynumber,
    timestamp))
DT::datatable(cc_data_cleaned_min, filter = "top", fillContainer = T,
    options = list(scrollX = TRUE, autoWidth = TRUE, scroller = TRUE,
        scrollY = "450px"))

Based on interactive filtering and visualization of the maps above, we were able to match the GPS trajectories to the credit card data provided, based on location and timestamp. In the process, we noted that there car trajectories match more than 1 credit card match. This is seen for employees in the facilities department (carID: 29, 100 to 107). We also noted some credit cards without a CarID match.

We have imported the match via a CSV file below.

##Import match
cc_carid_match <- read_csv("data/cc_carid_match.csv")

cc_carid_match$CarID = as.factor(cc_carid_match$CarID)
cc_carid_match$last4ccnum = as.factor(cc_carid_match$last4ccnum )
glimpse(cc_carid_match )
Rows: 55
Columns: 2
$ last4ccnum <fct> 1286, 1310, 1321, 1415, 1874, 1877, 2142, 2276, 2~
$ CarID      <fct> 22, 26, 11, 2, 14, 9, 25, 106, NA, 35, 7, NA, 23,~

Using the above dataset, we will be able to identify the owners of each credit card by matching the carID to the car assignment dataset. Scanning through the table, we noted that Ovan Bertrand (Facilities Group Manager) is assigned CarID 29 and uses 2 different credit cards (1- ending with 3547, 2- ending with 5921).

In the car assignment dataset provided, truck drivers are not assigned to a particular truck number. Hence we are unable to identify which credit card belongs to which truck driver as there is no car assignment information provided. We have managed to match the trajectories of trucks to 9 different credit cards, which coincides with the number of truck drivers. Hence this reinforces our opinion that trucks are shared among the truck drivers. With the limited information, we are unable to match the credit card to truck driver, but only to the CarID.

## Non-truck drivers
cc_carid_emp_match_NT <- cc_carid_match %>%
    filter(CarID != 100, CarID != 101, CarID != 102, CarID !=
        103, CarID != 104, CarID != 105, CarID != 106, CarID !=
        107) %>%
    left_join(assignment_data, "CarID") %>%
    arrange(CarID)

DT::datatable(cc_carid_emp_match_NT, filter = "top", fillContainer = T,
    options = list(scrollX = TRUE, autoWidth = TRUE, scroller = TRUE,
        scrollY = "450px"))
## Truck drivers
cc_carid_emp_match_T <- cc_carid_match %>%
    filter(CarID == 100 | CarID == 101 | CarID == 102 |
        CarID == 103 | CarID == 104 | CarID == 105 | CarID ==
        106 | CarID == 107) %>%
    arrange(CarID)
cc_carid_emp_match_T
# A tibble: 9 x 2
  last4ccnum CarID
  <fct>      <fct>
1 3506       101  
2 9220       101  
3 9614       101  
4 8642       104  
5 7792       105  
6 2276       106  
7 9152       106  
8 4530       107  
9 9735       107  

The below lists credit cards without a CarID match.

## Credit cards not matched to CarID
cc_carid_no_match <- cc_carid_match %>%
    filter(is.na(CarID))
cc_carid_no_match
# A tibble: 10 x 2
   last4ccnum CarID
   <fct>      <fct>
 1 2418       <NA> 
 2 2681       <NA> 
 3 4948       <NA> 
 4 5368       <NA> 
 5 5407       <NA> 
 6 6691       <NA> 
 7 7117       <NA> 
 8 8129       <NA> 
 9 8202       <NA> 
10 8411       <NA> 

3.1.2 Match credit card to loyalty card

After matching the timestamp of credit card transactions to CarID using the trajectory visualization, we can then match the credit card and loyalty card owners, using full join on 3 criteria match- 1) Date, 2) Location and 3) Price.

## Match loyalty and credit cards
loyalty_cc_match <- cc_data_cleaned %>%
    full_join(loyalty_data_cleaned, by = c("date", "location",
        "price"))

loyalty_cc_match = subset(loyalty_cc_match, select = -c(day.y,
    daynumber.y, timestamp, weekday.y)) %>%
    rename(day = "day.x", daynumber = "daynumber.x", weekday = "weekday.x")

DT::datatable(loyalty_cc_match, extensions = "Scroller",
    fillContainer = T, filter = "top", options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px")) %>%
    formatDate(4, method = "toLocaleDateString")

After the full join, we noted that there are some transactions whereby the amount recorded on the loyalty card and credit card is different, hence resulting in N/As in the joined dataset. To correct this, we will drop rows with N/As and perform a group to identify which loyalty cards are mapped to each credit card.

##Group by credit card number and loyalty card number
loyalty_cc_match_group <- loyalty_cc_match %>%
  group_by(last4ccnum, loyaltynum) %>%
  summarise(sumprice = sum(price)) %>%
  drop_na("last4ccnum", "loyaltynum")

##Check if each credit card is matched to each loyalty card
loyalty_cc_match_count <- loyalty_cc_match_group %>%
  group_by(last4ccnum) %>%
  summarise(loyalty_card_count = n()) %>%
  arrange(desc(loyalty_card_count))

loyalty_cc_match_count
# A tibble: 55 x 2
   last4ccnum loyalty_card_count
   <fct>                   <int>
 1 1286                        2
 2 4795                        2
 3 4948                        2
 4 5368                        2
 5 5921                        2
 6 7889                        2
 7 8332                        2
 8 1310                        1
 9 1321                        1
10 1415                        1
# ... with 45 more rows

Based on the above results, we noted that there are seven credit cards mapped to more than 1 loyalty card. This is likely to be a duplicate due to the method being used to match both cards. We will investigate each match individually to identify if its a true match or simply a coincidence. To differentiate the former from the latter, we will assess if there are several transactions made with the same combination of credit and loyalty card number. If there is only a single transaction made with the same combination, it is likely a coincidence. This can be done using the interactive datatable below.

DT::datatable(loyalty_cc_match, extensions = "Scroller",
    fillContainer = T, filter = "top", options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px")) %>%
    formatDate(4, method = "toLocaleDateString")

After further investigation, it is noted that there are several transactions made with the same combination of credit and loyalty card number for the following combination:

- Credit card number ending with 1286, matched with loyalty card number L3288 and L3572 

Hence the above is unlikely to be a coincidence. Loyalty cards L3288 and L3572 are being used together with the same credit card.

We will then perform a check on whether each loyalty card is used with one or multiple credit cards. We see eight loyalty cards matched to multiple credit cards. There are two loyalty cards matched to multiple credit cards with more than one transaction with the respective credit and loyalty card combination.

##Check for loyalty cards matched to multiple credit card
cc_loyalty_match_count <- cc_data_cleaned %>%
  full_join(loyalty_data_cleaned, by = c("date", "location", "price")) %>%
  group_by(last4ccnum, loyaltynum) %>%
  summarise(txncount = n()) %>%
  drop_na("last4ccnum", "loyaltynum") %>%
  group_by(loyaltynum) %>%
  summarise(cc_count = n()) %>%
  arrange(desc(cc_count))

cc_loyalty_match_count
# A tibble: 54 x 2
   loyaltynum cc_count
   <chr>         <int>
 1 L2070             2
 2 L2247             2
 3 L3288             2
 4 L3295             2
 5 L6119             2
 6 L6267             2
 7 L8566             2
 8 L9406             2
 9 L1107             1
10 L1485             1
# ... with 44 more rows
##Check for loyalty cards matched to multiple credit card, with > 1 txn
cc_loyalty_match_obs <- cc_data_cleaned %>%
  full_join(loyalty_data_cleaned, by = c("date", "location", "price")) %>%
  group_by(last4ccnum, loyaltynum) %>%
  summarise(txncount = n()) %>%
  filter(txncount > 1) %>%
  drop_na("last4ccnum", "loyaltynum") %>%
  group_by(loyaltynum) %>%
  summarise(cc_count = n()) %>%
  arrange(desc(cc_count))

cc_loyalty_match_obs
# A tibble: 54 x 2
   loyaltynum cc_count
   <chr>         <int>
 1 L3288             2
 2 L6267             2
 3 L1107             1
 4 L1485             1
 5 L1682             1
 6 L2070             1
 7 L2169             1
 8 L2247             1
 9 L2343             1
10 L2459             1
# ... with 44 more rows

To prevent similar case of coincidence from the matching method above, we filter for combinations with more than 1 transaction using the interactive datatable below. From the table below, we observe that L3288 and L6267 are used with 2 different credit cards.

- Loyalty card number L3288, matched with credit card number ending with 1286 and 9241

- Loyalty card number L6267, matched with credit card number ending with 6691 and 6899
DT::datatable(loyalty_cc_match, extensions = "Scroller",
    fillContainer = T, filter = "top", options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px")) %>%
    formatDate(4, method = "toLocaleDateString")

Given the assumption that the combination of credit and loyalty cards with only one transaction are due to coincidence, we will drop these coincidental duplicate matches.

loyalty_cc_match_cleaned <- loyalty_cc_match[!(loyalty_cc_match$last4ccnum ==
    "4795" & loyalty_cc_match$loyaltynum == "L2070" | loyalty_cc_match$last4ccnum ==
    "5921" & loyalty_cc_match$loyaltynum == "L9406" | loyalty_cc_match$last4ccnum ==
    "7889" & loyalty_cc_match$loyaltynum == "L2247" | loyalty_cc_match$last4ccnum ==
    "4948" & loyalty_cc_match$loyaltynum == "L3295" | loyalty_cc_match$last4ccnum ==
    "5368" & loyalty_cc_match$loyaltynum == "L6119" | loyalty_cc_match$last4ccnum ==
    "8332" & loyalty_cc_match$loyaltynum == "L8566"), ]

We then match the loyalty and credit card joined data to the car assignment data matched with credit card transactions.

cards_emp_match <- left_join(loyalty_cc_match_cleaned, cc_carid_match,
    by = "last4ccnum") %>%
    drop_na() %>%
    left_join(assignment_data, by = "CarID")

3.2 Uncertainties in method

  1. Assumption that employee will make a transaction when they visit a location. GPS coordinates only infer that the employee was at a certain area but does not mean that he/she has to make any purchases when at that location. Hence matching GPS data to transaction data alone may not be accurate in some instances where employee does not spend or makes purchases using cash instead of credit card, which will not reflect in the credit card transaction dataset.

  2. As the tourist map only shows certain tourist attractions and not all the locations in Abila, we are unable to match the full list of locations from trajectory as seen in the map view to the transactions occurring in locations which are not reflected in tourist map (e.g. Abila Zacharo). This may result in incorrect matching as it is more of a guesswork as not all locations in the credit card transactions can be mapped to locations visited in the GPS trajectory, as derived from the mapview.

  3. There are differences in the amount recorded on credit card and loyalty card for certain transactions, hence resulting in N/As in the joined dataset. We have removed the rows with N/As and instead worked with the joined data. However, this is based on the assumption that card transactions with same date, location and price relate to the same transaction. There may be instances of pure coincidence when this is not true. As there were instances of loyalty cards mapped to multiple credit cards and vice versa, We have assumed that should there be only one transaction with the same combination of credit and loyalty card, this is likely to be a coincidence and hence dropped from consideration.

  4. We use the GPS coordinates of the car and credit card transactions to determine the employee’s location. However, employees may not have used their assigned car, or may not have made a purchase using their own loyalty or credit card.

3.3 Uncertainties in data

  1. Assuming that the car assignment list provided includes all employees, we noted that there are 44 distinct employees. However, we noted that there are 55 distinct credit card numbers and 54 distinct loyalty card numbers. Hence, we are unable to map some of these credit and loyalty cards to a CarID. This may be because the car assignment data may not comprise of all employees.

  2. We noted that some of the timestamps provided in the credit card data seems inaccurate. The transaction data for “Bean There Done That”, “Brewed Awakenings”, “Coffee Shack” and “Jack’s Magical Beans” are all transacted on 12:00 time. Hence I suspect that this does not represent the actual time of the transaction. Therefore it is inaccurate to match GPS coordinates timestamp to the timestamp of these transactions. These transactions will not be used for the matching process.

  3. Background information did not specify how truck IDs are identified from the list of CarIDs. We have inferred these based on the difference in IDs, inferring that truck IDs are three digit IDs. It is better to obtain confirmation from Gastech on whether this is accurate.

  4. Truck drivers are not assigned a particular truck. Hence even though we are able to match the GPS trajectories of these trucks to certain credit cards, we are unable to match the credit and loyalty cards back to the employees, unless we are provided with information regarding which employee has checked out which truck at certain timings.

Qn4

Given the data sources provided, identify potential informal or unofficial relationships among GASTech personnel. Provide evidence for these relationships.

4.1 Network Visualisation

We will first create a network graph using the following code chunks.

## Prepare data
cc_carid_emp_match <- cc_carid_match %>%
    drop_na() %>%
    left_join(assignment_data, by = "CarID")

cc_carid_emp_match$LastName[is.na(cc_carid_emp_match$LastName)] <- "Truck Driver"
cc_carid_emp_match$FirstName[is.na(cc_carid_emp_match$FirstName)] <- "Truck Driver"
cc_carid_emp_match$CurrentEmploymentType[is.na(cc_carid_emp_match$CurrentEmploymentType)] <- "Facilities"
cc_carid_emp_match$CurrentEmploymentTitle[is.na(cc_carid_emp_match$CurrentEmploymentTitle)] <- "Truck Driver"

cc_data_cleaned_matched <- cc_data_cleaned %>%
    left_join(cc_carid_emp_match, by = "last4ccnum")
##Create nodes list
sources <- cc_data_cleaned_matched %>%
  distinct(last4ccnum) %>%
  rename(label = last4ccnum)

destinations <- cc_data_cleaned_matched %>%
  distinct(location) %>%
  rename(label = location)

##Create single df of unique users and locations
cc_nodes <- full_join(sources, destinations, by = "label")

##Add id column to nodes df
cc_nodes <- cc_nodes %>%
  rowid_to_column("id")

cc_nodes <- left_join(cc_nodes, cc_carid_emp_match, by = c("label" = "last4ccnum"))
cc_nodes <- cc_nodes %>%
  rename(group = CurrentEmploymentType)
##Create edges list
edges <- cc_data_cleaned_matched %>%
  group_by(last4ccnum, location, day, hour) %>%
  summarise(weight = n()) %>%
  ungroup()
edges
# A tibble: 1,490 x 5
   last4ccnum location              day  hour weight
   <fct>      <chr>               <int> <int>  <int>
 1 1286       Abila Zacharo           6    13      1
 2 1286       Abila Zacharo           9    13      1
 3 1286       Abila Zacharo          13    13      1
 4 1286       Abila Zacharo          16    13      1
 5 1286       Ahaggo Museum          18    14      1
 6 1286       Brew've Been Served     6     8      1
 7 1286       Brew've Been Served     7     7      1
 8 1286       Brew've Been Served     8     8      1
 9 1286       Brew've Been Served     9     8      1
10 1286       Brew've Been Served    10     8      1
# ... with 1,480 more rows
##Tidy edges list
cc_edges <- edges %>%
  left_join(cc_nodes, by = c("last4ccnum" = "label")) %>%
  rename(from = id)

cc_edges <- cc_edges %>%
  left_join(cc_nodes, by = c("location" = "label")) %>%
  rename(to = id) %>%
  mutate(timegroup = "")

cc_edges$timegroup[cc_edges$hour < 07 & 00 <= cc_edges$hour] = "00 - 06"
cc_edges$timegroup[cc_edges$hour < 10 & 07 <= cc_edges$hour] = "07 - 09"
cc_edges$timegroup[cc_edges$hour < 12 & 10 <= cc_edges$hour] = "10 - 11"
cc_edges$timegroup[cc_edges$hour < 15 & 12 <= cc_edges$hour] = "12 - 14"
cc_edges$timegroup[cc_edges$hour < 17 & 15 <= cc_edges$hour] = "15 - 16"
cc_edges$timegroup[cc_edges$hour < 22 & 17 <= cc_edges$hour] = "17 - 21"
cc_edges$timegroup[cc_edges$hour <= 24 & 22 <= cc_edges$hour] = "22 - 24"

##Reorder columns
cc_edges <- select(cc_edges, from, to, day, hour, timegroup, weight)
##Build network graph
cc_graph <- tbl_graph(nodes = cc_nodes,
                      edges = cc_edges,
                      directed = FALSE)

First, we can facet the network graph by day and timegroup. Users can update the cc_edges_selected to observe the facet graph for each day. The facet below is for day 01/06/2014. From the facet graph, we observe that the interactions mainly occur in the morning (0700-0900), lunc time (1200-1400) and after work (1700-2100).

##Facet network graph
cc_edges_selected <- cc_edges %>%
  filter(day == 6)

cc_graph <- tbl_graph(nodes = cc_nodes,
                      edges = cc_edges_selected,
                      directed = FALSE)

set_graph_style()
g <- ggraph(cc_graph, layout = "nicely") + 
  geom_edge_link(aes(width = weight), alpha=0.2) +
  scale_edge_width(range = c(0.1, 5)) +
  geom_node_point(aes(colour = group), size = 2) 

g + facet_edges(day~timegroup) +
th_foreground(foreground = "grey80",
border = TRUE) +
theme(legend.position = 'bottom')
Facet network graph by timegroup

Figure 8: Facet network graph by timegroup

By using the below interactive network graph together with the use of a datatable, we are able to visualize the interactions between parties and locations and use the datatable to filter for the relevant parties and locations to identify whether they were in the same location during the same period.

For example, filtering for Desafio Golf Course highlights the employees who have visited the location over the two week period, We note that they are in the Executives department. We will then filter the datatable below for Desafio Golf Course and noted that there were several executives who were in this location on the same date and time, hence indicating a probable interaction between these executives.

Figure: Select Desafio Golf Course on network graph
Figure: Filter for Desafio Golf Course in datatable
##Interactive viz

visNetwork(cc_nodes, cc_edges, height = "600px", width = "100%") %>%
  visIgraphLayout(layout = "layout_in_circle") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE, selectedBy = "group") %>%
  visLegend() %>%
  visGroups(groupname = "Security", color = "#FFFFA3") %>%
  visGroups(groupname = "Engineering", color = "#FFAFB1") %>%
  visGroups(groupname = "Information Technology", color = "#A1EC76") %>%
  visGroups(groupname = "Facilities", color = "#F0B3F5") %>%
  visGroups(groupname = "Executive", color = "#FF3333") %>%
  visLayout(randomSeed = 123)

Figure 9: Interactive network graph

cc_data_cleaned_matched <- subset(cc_data_cleaned_matched,
    select = -c(timestamp, day, price))
DT::datatable(cc_data_cleaned_matched, fillContainer = T,
    extensions = "Scroller", filter = "top", options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px"))

4.2 Network Analysis

We have derived the following potential informal relationships between employees.

  1. Desafio Golf course- Several of the executives were congregated at the Desafio Golf Course on Sundays. Willem Vasco-Pais (Environmental Safety Advisor), Ingrid Barranco (SVP/CFO) and Ada Camp-Corrente (SVP/CIO) were present on 01/12/2014 at 1300-1459 and 01/19/2014 at 1200-1559. Sten Sanjorge Jr (President/CEO) and Orhan Strum (SVP/COO) joining during the second session.

  2. Chostus Hotel- We see Elsa Orilla (Drill Technician) and Brand Tempestad (Drill Technician) from Engineering department checking into Chostus Hotel frequently on weekdays during lunch hour, up to four times over the two weeks observed (on 01/08/2014. 01/10/2014. 01/14/2014 and 01/17/2014). On further investigation, there are two employees with last name Orilla. We do not have further information on whether there is a relationship between them. However if they are partners, this may present as an illicit relationship given that it occurs frequently during lunch hours on weekdays when Kare Orilla is having lunch elsewhere.

  3. Kronos Mart- Transactions noted on odd hours such as on 01/19/2014 at 0300-0359 where Varja Lagos (Badging Office), Nils Calixto (IT Helpdesk) and Ada Campo-Corrente (SVP/CIO) made purchases during the same hour.

  4. There were several locations only frequented by the Facilities department, such as Abila Airport, Kronos Pipe and Irrigation, Nationwide Refinery, Maximum Iron and Steel, Stewart and Sons Fabrication and Carlyle Chemical. Their visits are usually not at the same time.

  5. Coffee Shack is visited by only one person with last 4 credit card number 7117 during lunch hour, on most weekdays.

  6. Bean There Done That is popular with the Engineering team during lunch hour on most weekdays, with no transactions by this same group over the weekends.

  7. Brewed Awakenings is favoured by Ingrid Barranco (SVP/CFO), Elsa Orilla (Drill Technician) and Ada Campo-Corrente (SVP/CIO) during lunch hour on weekdays and they make purchases there frequently during the same periods, hence would probably have crossed paths here.

  8. Jack’s Magical Beans is favoured by Isak Baza (IT Technician), Isande Borrasca (Drill Technician), Orhan Strum (SVP/COO), Willem Vasco-Pais (Environmental Safety Advisor) during lunch hour on weekdays and they make purchases there frequently during the same periods, hence would probably have crossed paths here.

  9. From the car assignment information provided, we noted both Isia Vann (Perimeter Control) and Edvard Vann (Perimeter Control) have the same last name. We do not have further information on whether there is a relationship between them. However, on further investigation, they seem to have a close relationship given that there were multiple times where they made transactions in the same location during the same date and hour, such as the following:

    • On 01/07/2014, 01/13/2014, 01/14/2014, they made transactions at Guy’s Gyros at 2000-2059.
    • On 01/08/2014, 01/09/2014, 01/10/2014, 01/13/2014, 01/14/2014, 01/15/2014, 01/16/2014 and 01/17/2014, they made transactions at Brew’ve Been Served at 0700-0759.
    • On 01/14/2014, 01/19/2014, they made transactions at Katerina’s Cafe at 1300-1359.
  10. Both Birgitta Frente (Geologist) and Vira Frente (Hydraulic Technician) have the same last name. We do not have further information on whether there is a relationship between them. However on further investigation, they do not seem to have a particularly close relationship as there was only one time where they made transactions in the same location during the same date and hour- On 01/13/2014 at Bean There Done That at 1200-1259.

  11. There are other employees with the same last names such as Minke Mies and Henk Mies, Valeria Morlun and Adan Morlun, Claudio Hawelon and Benito Hawelon. However as some of them belong to the facilities truck drivers team, where due to the data limitations, we are unable to match each the CarID to Employee Name, we are unable to identify the routes of these truck drivers to determine if they were in the same location with the employees with similar last names.

Qn5

Do you see evidence of suspicious activity? Identify 1- 10 locations where you believe the suspicious activity is occurring, and why.

5.1 Suspicious Activities identified

We have identified the following suspicious activities:

  1. Unusual transactions and frequent overtime

Nils Calixto (IT Helpdesk) returned back to Gastech during late hours frequently. This is highly suspicious as the nature of his role does not seem to require returning to office during late nights when there are few employees at work.

From the facet graph below, we observe that he has returned to office during the period 2100-2400 on 01/06/2014, 01/08/2014, 01/15/2014 and 01/17/2014.

## Facet map- by day
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 1, timegroup == "21-24")

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive map of CarID 1 at time 2100-2359")

Filtering the datatable below, we also observe that Nils Calixto has made several suspicious transactions on his credit card on 01/13/2014:

cc_data_cleaned_matched_DT <- subset(cc_data_cleaned_matched,
    select = -c(daynumber))
DT::datatable(cc_data_cleaned_matched_DT, extensions = "Scroller",
    filter = "top", fillContainer = T, options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px"))

Figure 10: Datatable of credit card transactions

## Plotting GPS paths- interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 1, day == 13, hour == 19)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive map of CarID 1 on day 13, hour 1700-1759")
## Plotting GPS paths- interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 1, day == 13, hour == 13)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive map of CarID 1 on day 13, timegroup 1300-1359")
  1. Gathering of employees

There seems to have been a gathering by certain employees on 01/10/2014 till late. Employees only started leaving at around 2300. We filter for the day and hour and plotted a facet map. By tracing the employees route with the interactive facet map below, we noted that except for Isia Vann (Perimeter Control)(CarID 16) and Willem Vasco-Pais (Environmental Safety Advisor, CarID 35), the other employees seem to leave the house in Carnero Street at around 2300 to return to their respective homes. We have excluded Isia Vann and Willem Vasco-Pais as the route is different from the others. We also noted that two other employees left the gathering later than the others- Felix Balas and Marin Onda.

The employees who attended the gathering seem to be mainly from the Engineering and Information Technology department. We noted that this area seems to be Lars Azada’s house and hence he may be the host. We have inferred that this is his house as by tracking the trajectories of his movement, we noted that he tends to return to this location during afterwork hours.

## Facet map- by id
gps_path_selected <- gps_path_cleaned %>%
    filter(day == 10, hour == 23 | hour == 22)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "id", ncol = 3)
Facet graph of all cars moving at time 2200 - 2359 on day 10

Figure 11: Facet graph of all cars moving at time 2200 - 2359 on day 10

## Facet map- by id
gps_path_selected <- gps_path_cleaned %>%
    filter(day == 11, hour == 0)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "id", ncol = 3)
Facet graph of all cars moving at time 0000-0059 on day 11

Figure 12: Facet graph of all cars moving at time 0000-0059 on day 11

  1. Monitoring of CIO’s house at odd hours

The following employees have been visiting a similar location near Spetson Park late at night. On further investigation, this seems to be the Ada Campo-Corrente’s (CIO) house. We have inferred that this is the CIO house by tracking the trajectories of her movement. We noted that she tends to return to this location during afterwork hours.

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 10, hour == 17 | hour == 18)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Facet graph of CarID 10 at 1700-1859 (afterwork hours)")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 15, day == 7)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 15 on 01/07/2014")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 16, day == 7)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 16 on 01/07/2014")
  1. Monitoring of COO’s house at odd hours

The following employees have been visiting a similar location near Taxichorn Park late at night. On further investigation, this seems to be the Orhan Strum’s (COO) house. We have inferred that this is the COO house by tracking the trajectories of his movement. We noted that he tends to return to this location during afterwork hours.

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 32, hour == 18 | hour == 17)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 32 at 1700-1800 (afterwork hours)")

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 15, day == 9)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 15 on 01/09/2014")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 24, day == 9, hour == 3)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 24 on 01/09/2014 at 0300-0359")
  1. Driving around Abila without making any transactions

On 01/11/2014 night, Bertrand Ovan (Facilities Group Manager, CarID 29) visited a sequence of places, without making any transactions. He left his house at 22:11, arriving at Brew’ve Been Served Cafe at 2212. After staying for 9 minutes, he went to the Ouzeri Elian, arriving at 2227. He then left at 2234 and headed for the place near Kronos Mart and arrived at 22:40. 15 minutes later, he went to the Alberts Fine Clothing arriving at 2258 and stayed there for another 23 minutes. He drove to an area near U-Pump or Jack Magic Beans, starting from 2321 and arrived at the at 23:25. He stayed there for about 30 minutes, and arrived home at 2359. This is quite unusual as he visited several locations without making any transactions.

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 29, day == 11, timegroup == "21-24")

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 29 on 01/11/2014, 2100-2359")
cc_data_cleaned_matched_DT <- subset(cc_data_cleaned_matched,
    select = -c(daynumber))
DT::datatable(cc_data_cleaned_matched_DT, extensions = "Scroller",
    filter = "top", fillContainer = T, options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px"))

Figure 13: Datatable of credit card transactions

  1. Monitoring of CFO’s house at odd hours

The following employees have been visiting a similar location late at night. On further investigation, this seems to be the CFO’s house. We have inferred that this is the Ingrid Barranco’s (CFO) house as by tracking the trajectories of her movement, we noted that she tends to return to this location during afterwork hours.

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 4, hour == 18 | hour == 17)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "day", ncol = 3) +
    tm_layout(title = "Interactive graph of of CarID 4 at 1700-1859 (afterwork hours)")

## Facet map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 16, day == 10, hour == 23)

t1 <- tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red")

t2 <- tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red")

current.mode <- tmap_mode("plot")
tmap_arrange(t1, t2)
Graph of CarID16 at time 2300-2359 on day 10 and 0300-0359 on day 11

Figure 14: Graph of CarID16 at time 2300-2359 on day 10 and 0300-0359 on day 11

tmap_mode(current.mode)
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 24, day == 14)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 24 on 01/14/2014")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 21, day == 11)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 21 on 01/11/2014")
gps_path_selected1 <- gps_path_cleaned %>%
    filter(id == 21, day == 13, hour == 23)

m1 <- tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected1) +
    tm_lines(col = "red")

gps_path_selected2 <- gps_path_cleaned %>%
    filter(id == 21, day == 14, hour == 3)

m2 <- tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected2) +
    tm_lines(col = "red")

current.mode <- tmap_mode("plot")
tmap_arrange(m1, m2)
Graph of CarID 21 on 01/13/2014 and 01/14/2014

Figure 15: Graph of CarID 21 on 01/13/2014 and 01/14/2014

tmap_mode(current.mode)
  1. Hennie Osvaldo’s multiple residences

Hennie Osavado seems to own two residences-

  1. Area next to Guy’s Gyros,

  2. Area near Frydo’s Autosupply N’More.

He typically returns to the second house on weekdays and the first house on weekends, with the exception of 01/08/2014 and 01/15/2014.

## Facet map- by day
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 21, timegroup == "21-24" | timegroup == "17-20")

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "day", ncol = 3)
Facet graph of CarID 21 on from 1700-2400

Figure 16: Facet graph of CarID 21 on from 1700-2400

  1. Potential relationship between Elsa Orilla and Brand Tempestad

As discussed above, we also noted that Elsa Orilla (Drill Technician, CarID 7) and Brand Tempestad (Drill Technician, carID 33) frequently visited Chostus Hotel on weekdays during lunch hour, up to four times over the two weeks observed (01/08/2014, 01/10/2014, 01/14/2014, 01/17/2014). This can be seen from their credit card transactions using the datatable below.

On further investigation, there are two employees with last name Orilla. We do not have further information on whether there is a relationship between them. However if they are partners, this may present as an illicit relationship given that it occurs frequently during lunch hours on weekdays when Kare Orilla is having lunch elsewhere. We also noticed that both parties tend to stagger their departure time from the location for about 10 minutes.

## Facet map- by id
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 7, hour == 12 | hour == 13 | hour == 14)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "day", ncol = 3)
Facet graph of CarID 21 during 1200-1400

Figure 17: Facet graph of CarID 21 during 1200-1400

## Facet map- by id
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 33, hour == 12 | hour == 13 | hour == 14)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "day", ncol = 3)
Facet graph of CarID 33 during 1200-1400

Figure 18: Facet graph of CarID 33 during 1200-1400

cc_data_cleaned_matched_DT <- subset(cc_data_cleaned_matched,
    select = -c(daynumber))
DT::datatable(cc_data_cleaned_matched_DT, extensions = "Scroller",
    filter = "top", fillContainer = T, options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px"))

Figure 19: Datatable of credit card transactions

  1. Repeated visits to locations not near eateries

Hennie Osvaldo (Perimeter Control, CarID 21), Minke Mies (Perimeter Control, CarID 24), Inga Ferro (Site Control, CarID 13) and Loreto Bodrogi (Site Control, CarID 15) usually take turns visiting a selected few locations during the lunch break (except for 01/06/2014, 01/12/2014 and 01/19/2014).

These locations include

  1. Area near Frank’s Fuels,

  2. Area west of Bean There Done That,

  3. Area south of Hallowed Grounds,

  4. Area south of Katerina’s Cafe and

  5. Area south-west of the Arkadious Park.

These locations are not near most of the usual dining locations and some tend to be quite out of the way. Hence this has been raised as suspicious behavior.

## Facet map- by id
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 21 | id == 24 | id == 13 | id == 15, hour ==
        12 | hour == 13)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "day", ncol = 3)
Facet graph of the various car trajectories during 1200-1300

Figure 20: Facet graph of the various car trajectories during 1200-1300

  1. Suspicious transactions and activities at petrol stations

Given that there are relatively few records at gas stations (U-Pump and Frank’s Fuels), we assume that GasTech provides free petrol for their employee cars. Scanning through the few records at gas stations using the datatable below, we note the following:

- Varja Lagos (Badging Office, CarID 23) made a transaction at U-Pump on 01/06/2014 at 1728.
- Nils Calixto (IT Helpdesk, CarID 1) made a transaction at U-Pump on 01/13/2014 at 1318.
- Loreto Bodrogi (Site Control, CarID 15) made a transaction at Frank's Fuel on 01/08/2014 at 1229.
- Felix Balas (Engineer, CarID 3) made a transaction at Frank's Fuel on 01/18/2014 at 1839.
cc_data_cleaned_matched_DT <- subset(cc_data_cleaned_matched,
    select = -c(daynumber))
DT::datatable(cc_data_cleaned_matched_DT, extensions = "Scroller",
    filter = "top", fillContainer = T, options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px"))

Figure 21: Datatable of credit card transactions

However, we noted that Nils Calixto (CarID 1) GPS coordinates put him at an area near Albert’s Fine Clothes instead of U-Pump when the credit card transaction was made.

Furthermore, we noted that Minke Mies (Perimeter Control, CarID 24) did not have transactions recorded at U-Pump, but was in an area near U-Pump on 01/13/2014 from 1235 to 1322.

Lastly, Bertrand Ovan (Facilities Group Manager, CarID29) drove around the city from 2211 till midnight on 01/11/2014, and stopped near U-Pump from 2325 to 2355 without related transactions.

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 1, day == 13, hour == 13)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "IInteractive graph of CarID 1 on 01/13/2014 at 1300-1359")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 24, day == 13, hour == 12 | hour ==
        13)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 24 on 01/13/2014 at 1200-1359")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 29, day == 11, hour == 22 | hour ==
        23)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 29 on 01/11/2014 at 2200-2359")
  1. Truck Schedules

We noted that for most days across the two weeks analyzed, trucks are usually not used after 1700, with the exception of the below:

- On 01/13/2014, CarID 101 made 2  trips around the lower half of Abila, finally ending at Gastech at 1857.
- On 01/13/2014, CarID 104 was returning from Abila Airport at 1711, arriving at Gastech at 1717. 
- On 01/15/2014, CarID 107 departed Gastech for an area west of Maximum Iron and Steel arriving at 1727 and departed at 1813, arriving at Abila Hospital at 1825.
- On 01/17/2014, CarID 107 departed Nationwide Refinery for Abila Port, arriving at 1716 and departed almost immediately back to Gastech, arriving at 1721.
- CarID 101, 104, 105 and 106 were operating till late from 1700 to 2106 on 01/16/2014. This is unusual given that trucks are usually not used after 1700.
## Facet map- by day
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 101 | id == 104 | id == 105 | id == 106 | id ==
        107, timegroup == "17-20" | timegroup == "21-24")

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "day", ncol = 3)
Facet graph by day of the various truck trajectories during 1700-2359

Figure 22: Facet graph by day of the various truck trajectories during 1700-2359

  1. Unusual transactions at Kronos Mart

There are several transactions in Kronos Mart at 3am on 01/13/2014 and 01/19/2014, as seen from the datatable. By filtering the map for 3am, we noted that there were no cars near Kronos Mart. Hence, these transactions are either incorrectly timed or may even be fraudulent. Furthermore, we noted that the related trajectories are in the day before the actual credit card transaction:

DT::datatable(cc_data_cleaned_matched, extensions = "Scroller",
    filter = "top", fillContainer = T, options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px"))
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 32, day == 11 | day == 12)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 32 on 01/11/2014 and 01/12/2014")

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 23, day == 18 | day == 19)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 23 on 01/18/2014 and 01/19/2014")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 1, day == 18 | day == 19)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 1 on 01/18/2014 and 01/19/2014")
Facet graph of CarID 1

Figure 23: Facet graph of CarID 1

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 10, day == 18 | day == 19)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 10 on 01/18/2014 and 01/19/2014")

  1. Visits to Kronos Capitol

The following employees have made visits to Kronos Capitol over the two week period.

This is suspicious as the government sits at the Capitol and hence these employees might be liaising with the government on certain issues regarding environmental stewardship.

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 35, day == 11, timegroup == "13-16" |
        timegroup == "17-20")

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 35 on 01/11/2014 at 1300- 1959")
## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 15 | id == 34 | id == 22 | id == 28,
        day == 18)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of of various car trajectories on 01/18/2014")
  1. CarID 28 GPS trajectories

We noted that CarID 28 trajectories are chaotic. This car is being assigned to Isande Borrasca (Drill Technician). There may have been tampering made to the GPS tracking system.

## Interactive map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 28)

tmap_mode("view")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3,
    alpha = NA, saturation = 1, interpolate = TRUE,
    max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_layout(title = "Interactive graph of CarID 35 trajectory across 2 week period")
  1. Gathering at Desafio Golf Course

Several of the executives were congregated at the Desafio Golf Course on Sundays. Willem Vasco-Pais (Environmental Safety Advisor, CarID 35), Ingrid Barranco (SVP/CFO, CarID 4) and Ada Camp-Corrente (SVP/CIO, CarID 10) were present on 12/1/2014 at 1300-1400 and 19/1/2014 at 1200-1500. Sten Sanjorge Jr (President/CEO, CarID 31) and Orhan Strum (SVP/COO, CarID 32) joining during the second session.

## Facet map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 35 | id == 4 | id == 10, day == 12, hour ==
        12 | hour == 13 | hour == 14)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "id", ncol = 3)
Facet graph of various executives' car trajectories on 01/12/2014, 1200-1459

Figure 24: Facet graph of various executives’ car trajectories on 01/12/2014, 1200-1459

## Facet map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 35 | id == 4 | id == 10 | id == 32 | id == 31,
        day == 19, hour == 12 | hour == 13)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "id", ncol = 3)
Facet graph of various executives' car trajectories on 01/19/2014, 1200-1359

Figure 25: Facet graph of various executives’ car trajectories on 01/19/2014, 1200-1359

We also noted that Sten Sanjorge Jr (President/CEO) seems to not be in Abila for most of the two weeks observed (hence missing the first golf gathering with the other executives) as his GPS records only start from 01/17/2014 to 01/19/2014. On his return, he seems to be staying at Chostus Hotel. We have inferred this by tracking the trajectories of his movement, we noted that he tends to return to the hotel during afterwork hours. This is unusual as Sten Sanjorge Jr does not seem to have a home in Abila.

## Facet map
gps_path_selected <- gps_path_cleaned %>%
    filter(id == 31)

tmap_mode("plot")
tm_shape(bgmap) + tm_rgb(bgmap, r = 1, g = 2, b = 3, alpha = NA,
    saturation = 1, interpolate = TRUE, max.value = 255) + tm_shape(gps_path_selected) +
    tm_lines(col = "red") + tm_facets(by = "day", ncol = 3)
Facet graph of CarID 31

Figure 26: Facet graph of CarID 31

  1. Usage of trucks for personal use

By filtering the datatable below, We noted several transactions by truck drivers in eateries, which we assume are categorized as personal business. According to Gastech’s regulations, trucks cannot be used for personal business. The following transactions were made by errant truck drivers and deemed for personal use:

DT::datatable(cc_data_cleaned_matched, extensions = "Scroller",
    filter = "top", fillContainer = T, options = list(scrollX = TRUE,
        autoWidth = TRUE, scroller = TRUE, scrollY = "450px"))